8 research outputs found

    Preserving Privacy in High-Dimensional Data Publishing

    We are witnessing a continuous expansion of information technology that never ceases to impress us with its computational power, storage capacity, and agile mobility. Such technology is becoming more pervasive by the day and has enhanced various aspects of our daily lives. GPS-equipped devices, smart card automated fare collection systems, and sensory technology are but a few examples of advanced, yet affordable, data-generating technologies that are an integral part of modern society. To enhance user experience or provide better services, service providers rely on collecting person-specific information from users. The collected data is then studied and analyzed in order to extract useful information. It is common practice for the collected data to be shared with a third party, e.g., a data mining firm, for analysis. However, the shared data must not leak sensitive information about the individuals to whom the data belongs or reveal their identity. In other words, individuals’ privacy must be protected in the published data. Privacy-preserving data publishing is a research area that studies how to anonymize person-specific data without compromising its utility for future analysis. This thesis studies and proposes anonymization solutions for three types of high-dimensional data: trajectory streams, static trajectories, and relational data. We demonstrate through theoretical and experimental analysis that our proposed solutions, for the most part, outperform state-of-the-art methods in terms of utility, efficiency, and scalability.

    Preserving Data Privacy and Information Usefulness for RFID Data Publishing

    Radio-Frequency IDentification (RFID) is an emerging technology that employs radio waves to identify, locate, and track objects. RFID technology has wide applications in many areas including manufacturing, healthcare, and transportation. However, the manipulation of uniquely identifiable objects gives rise to privacy concerns for the individuals carrying these objects. Most previous works on privacy-preserving RFID technology, such as EPC re-encryption and killing tags, have focused on the threats caused by the physical RFID tags in the data collection phase, but these techniques cannot address privacy threats in the data publishing phase, when a large volume of RFID data is released to a third party. We explore the privacy threats in RFID data publishing. We illustrate that even though explicit identifying information, such as phone numbers and SSNs, is removed from the published RFID data, an attacker may still be able to perform privacy attacks by utilizing background knowledge about a target victim's visited locations and timestamps. Privacy attacks include identifying a target victim's record and/or inferring their sensitive information. High-dimensionality is an inherent characteristic of RFID data; therefore, applying traditional anonymity models, such as K-anonymity, to RFID data would significantly reduce data utility. We propose a new privacy model, devise an anonymization algorithm to address the special challenges of RFID data, and experimentally evaluate the performance of our method. Experiments suggest that applying our model significantly improves the data utility when compared to applying the traditional K-anonymity model.
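    The K-anonymity property the abstract refers to is easy to state in code: every combination of quasi-identifier values must occur at least K times. The following is a minimal illustrative sketch (the field names are hypothetical and this is not the paper's proposed algorithm):

    ```python
    from collections import Counter

    def is_k_anonymous(records, quasi_identifiers, k):
        """True iff every combination of quasi-identifier values
        occurs in at least k records."""
        combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
        return all(count >= k for count in combos.values())

    # Toy RFID-style records: each (location, hour) visit acts as a quasi-identifier.
    visits = [
        {"loc": "gate-A", "hour": 9,  "ticket": "monthly"},
        {"loc": "gate-A", "hour": 9,  "ticket": "single"},
        {"loc": "gate-B", "hour": 17, "ticket": "monthly"},
    ]
    is_k_anonymous(visits, ["loc", "hour"], k=2)  # False: the gate-B visit is unique
    ```

    On real RFID trails, where each record contains many visited (location, time) pairs, almost every combination is unique, which is why generalizing the data until this check passes destroys most of its utility.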

    Memory Forensics: Recovering Chat Messages and Encryption Master Key

    © 2019 IEEE. In this pervasive digital world, we are witnessing an era in which cybercriminals are increasingly adept at exploiting widespread digital devices to perform various malicious activities. By utilizing anti-forensic techniques, cybercriminals are able to erase or alter digital evidence that could otherwise be used against them in court. One of the most critical sources of digital evidence that forensic investigators examine is the physical memory of a digital device, i.e., Random Access Memory (RAM). RAM is a volatile memory that stores data about recent activity, but only while the device is powered on; once the device powers off, all data stored in RAM is lost permanently. Forensic investigators therefore find great value in RAM data and need to preserve it without harming the integrity of the collected evidence. Many existing tools can acquire and analyze images of the data stored in RAM. This paper tackles fundamental topics in security, privacy, and digital forensics. Specifically, it examines memory dumps of Windows 7 computers with 4 GB of RAM, with the objectives of identifying an instant messaging tool, recovering its chat messages, and recovering the master encryption keys of volumes encrypted by BitLocker and TrueCrypt. Throughout this paper, we utilize two widely used tools, namely Volatility and WinHex, for their various functionalities designed specifically for memory forensic investigation.
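    Underlying the tool-based workflow is a simple idea: known byte patterns (a protocol marker, a key-schedule signature) are searched for in the raw image and the surrounding bytes are carved out. A generic sketch of that idea, with a made-up marker and toy dump (not the paper's procedure, which relies on Volatility and WinHex):

    ```python
    def carve_around(dump: bytes, marker: bytes, context: int = 16):
        """Yield the byte window surrounding each occurrence of `marker`
        in a raw memory image -- the string-carving idea used when
        hunting chat fragments or key material in a dump."""
        start = 0
        while (pos := dump.find(marker, start)) != -1:
            yield dump[max(0, pos - context): pos + len(marker) + context]
            start = pos + 1

    # Toy "dump"; in practice this would be a multi-gigabyte RAM image.
    dump = b"\x00\x00user_says: hello world\x00\x00"
    hits = list(carve_around(dump, b"user_says:", context=16))
    ```

    Real investigations layer structure on top of this (process listings, pool scans, UTF-16 decoding), which is exactly what frameworks such as Volatility automate.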

    Forensic Analysis of Microsoft Teams: Investigating Memory, Disk and Network

    Videoconferencing applications have seen a jump in their user base owing to the COVID-19 pandemic. The security of these applications has certainly been a hot topic since millions of VoIP users’ data is involved. However, research pertaining to VoIP forensics is still limited to Skype and Zoom. This paper presents a detailed forensic analysis of Microsoft Teams, one of the top three videoconferencing applications, in the areas of memory, disk-space and network forensics. Extracted artifacts include critical user data, such as emails, user account information, profile photos, exchanged (including deleted) messages, exchanged text/media files, timestamps and Advanced Encryption Standard encryption keys. The encrypted network traffic is investigated to reconstruct client-server connections involved in a Microsoft Teams meeting with IP addresses, timestamps and digital certificates. The conducted analysis demonstrates that, with strong security mechanisms in place, user data can still be extracted from a client’s desktop. The artifacts also serve as digital evidence in a court of law, in addition to providing forensic analysts with a reference for cases involving Microsoft Teams.
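    Reconstructing the client-server connections behind an encrypted meeting typically starts by grouping captured packets into endpoint conversations and bounding each conversation in time. A small sketch of that first step, using a hypothetical record schema (src, dst, ts) rather than the paper's actual capture format:

    ```python
    from collections import defaultdict

    def reconstruct_sessions(flows):
        """Group packet records into endpoint conversations and report
        each conversation's first and last timestamp."""
        sessions = defaultdict(list)
        for p in flows:
            key = tuple(sorted((p["src"], p["dst"])))  # direction-agnostic pairing
            sessions[key].append(p["ts"])
        return {pair: (min(ts), max(ts)) for pair, ts in sessions.items()}

    flows = [
        {"src": "10.0.0.5",   "dst": "52.112.0.1", "ts": 100},
        {"src": "52.112.0.1", "dst": "10.0.0.5",   "ts": 104},
        {"src": "10.0.0.5",   "dst": "52.113.9.9", "ts": 101},
    ]
    reconstruct_sessions(flows)
    ```

    Matching the resulting endpoint pairs against TLS certificate metadata is what lets an analyst label each conversation with the service it belongs to, even when the payload stays encrypted.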

    Differentially private multidimensional data publishing

    © 2017, Springer-Verlag London Ltd., part of Springer Nature. Various organizations collect data about individuals for various reasons, such as service improvement. In order to mine the collected data for useful information, publishing that data to data analysts, research institutes, or simply the general public has become a common practice among those organizations. The quality of published data significantly affects the accuracy of the data analysis and thus affects decision making at the corporate level. In this study, we explore the research area of privacy-preserving data publishing, i.e., publishing high-quality data without compromising the privacy of the individuals whose data are being published. Syntactic privacy models, such as k-anonymity, impose syntactic privacy requirements and make certain assumptions about an adversary’s background knowledge. To address this shortcoming, we adopt differential privacy, a rigorous privacy model that is independent of any adversary’s knowledge and insensitive to the underlying data. The published data should preserve individuals’ privacy, yet remain useful for analysis. To maintain data utility, we propose DiffMulti, a workload-aware and differentially private algorithm that employs multidimensional generalization. We devise an efficient implementation of the proposed algorithm and use a real-life data set for experimental analysis. We evaluate the performance of our method in terms of data utility, efficiency, and scalability. When compared to closely related existing methods, DiffMulti significantly improves data utility, in some cases by orders of magnitude.
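    The standard building block behind differentially private publishing of counts is the Laplace mechanism: add noise drawn from Laplace(sensitivity/ε) to each true count. A minimal sketch of that mechanism (the generic primitive, not the DiffMulti algorithm itself):

    ```python
    import random

    def laplace_noise(scale):
        # The difference of two iid Exp(1) draws is a standard Laplace variate.
        return scale * (random.expovariate(1.0) - random.expovariate(1.0))

    def private_count(true_count, epsilon, sensitivity=1.0):
        """Release a count perturbed with Laplace(sensitivity/epsilon)
        noise, satisfying epsilon-differential privacy for counting
        queries (one individual changes the count by at most 1)."""
        return true_count + laplace_noise(sensitivity / epsilon)
    ```

    A smaller ε gives stronger privacy but a larger noise scale, which is the utility trade-off that workload-aware algorithms such as DiffMulti try to manage by spending the privacy budget where the analysis workload needs accuracy most.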

    Differentially Private Release of Heterogeneous Network for Managing Healthcare Data

    With the increasing adoption of digital health platforms through mobile apps and online services, people have greater flexibility connecting with medical practitioners, pharmacists, and laboratories and accessing resources to manage their own health-related concerns. Many healthcare institutions are connecting with each other to facilitate the exchange of healthcare data, with the goal of effective healthcare data management. The contents generated over these platforms are often shared with third parties for a variety of purposes. However, sharing healthcare data comes with the potential risk of exposing patients’ sensitive information to privacy threats. In this article, we address the challenge of sharing healthcare data while protecting patients’ privacy. We first model a complex healthcare dataset using a heterogeneous information network that consists of multi-type entities and their relationships. We then propose DiffHetNet, an edge-based differentially private algorithm, to protect the sensitive links of patients from inbound and outbound attacks in the heterogeneous health network. We evaluate the performance of our proposed method in terms of information utility and efficiency on different types of real-life datasets that can be modeled as networks. Experimental results suggest that DiffHetNet generally yields less information loss and is significantly more efficient in terms of runtime in comparison with existing network anonymization methods. Furthermore, DiffHetNet is scalable to large network datasets.
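    Edge-level differential privacy means the released network must look almost the same whether or not any single link (e.g., patient-medication) is present. A textbook baseline that achieves this is randomized response over the adjacency structure; the sketch below illustrates the guarantee and is not the DiffHetNet algorithm itself:

    ```python
    import math
    import random

    def perturb_edges(edges, nodes, epsilon):
        """Randomized response over every potential edge: each adjacency
        bit is kept with probability e^eps / (1 + e^eps) and flipped
        otherwise, which satisfies edge-level epsilon-differential
        privacy."""
        keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        out = set()
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                present = (u, v) in edges or (v, u) in edges
                bit = present if random.random() < keep else not present
                if bit:
                    out.add((u, v))
        return out
    ```

    The quadratic pass over all node pairs is what makes such baselines expensive on large networks, one motivation for algorithms that report better runtime scaling.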

    SafePath: Differentially-private publishing of passenger trajectories in transportation systems

    © 2018 Elsevier B.V. In recent years, the collection of spatio-temporal data that captures human movements has increased tremendously due to the advancements in hardware and software systems capable of collecting person-specific data. The bulk of the data collected by these systems has numerous applications, or it can simply be used for general data analysis. Therefore, publishing such big data is greatly beneficial for data recipients. However, in its raw form, the collected data contains sensitive information pertaining to the individuals from which it was collected and must be anonymized before publication. In this paper, we study the problem of privacy-preserving publishing of passenger trajectories and propose a solution under the rigorous differential privacy model. Unlike sequential data, which captures only the ordering of data items, spatio-temporal data is challenging to handle because the added temporal dimension makes it extremely sparse. Our proposed solution introduces an efficient algorithm, called SafePath, that models trajectories as a noisy prefix tree and publishes ϵ-differentially-private trajectories while minimizing the impact on data utility. Experimental evaluation on real-life transit data in Montreal suggests that SafePath significantly improves efficiency and scalability with respect to large and sparse datasets, while achieving comparable results to existing solutions in terms of the utility of the sanitized data.
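    The noisy-prefix-tree idea can be sketched compactly: count every trajectory prefix up to some height, perturb each count with Laplace noise (splitting the privacy budget across tree levels), and prune prefixes whose noisy count falls below a threshold. This is a simplified illustration under those assumptions; the published SafePath algorithm is considerably more involved:

    ```python
    import random
    from collections import defaultdict

    def laplace_noise(scale):
        # Difference of two iid Exp(1) draws is a standard Laplace variate.
        return scale * (random.expovariate(1.0) - random.expovariate(1.0))

    def noisy_prefix_counts(trajectories, epsilon, height, threshold):
        """Count trajectory prefixes up to `height`, add
        Laplace(height/epsilon) noise to each count (budget split
        uniformly across levels), and prune noisy counts below
        `threshold`."""
        counts = defaultdict(int)
        for t in trajectories:
            for depth in range(1, min(height, len(t)) + 1):
                counts[tuple(t[:depth])] += 1
        scale = height / epsilon
        noisy = {p: c + laplace_noise(scale) for p, c in counts.items()}
        return {p: c for p, c in noisy.items() if c >= threshold}
    ```

    Pruning is what tames the sparseness problem: once a (location, time) prefix's noisy count drops below the threshold, none of its extensions are published, so the tree never materializes the vast empty portion of the spatio-temporal domain.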

    Service-oriented architecture for high-dimensional private data mashup

    Mashup is a web technology that allows different service providers to flexibly integrate their expertise and to deliver highly customizable services to their customers. Data mashup is a special type of mashup application that aims at integrating data from multiple data providers depending on the user’s request. However, integrating data from multiple sources brings about three challenges: 1) Simply joining multiple private data sets together would reveal the sensitive information to the other data providers. 2) The integrated (mashup) data could potentially sharpen the identification of individuals and, therefore, reveal their person-specific sensitive information that was not available before the mashup. 3) The mashup data from multiple sources often contain many data attributes. When enforcing a traditional privacy model, such as K-anonymity, the high-dimensional data would suffer from the problem known as the curse of high dimensionality, resulting in useless data for further data analysis. In this paper, we study and resolve a privacy problem in a real-life mashup application for the online advertising industry in social networks, and propose a service-oriented architecture along with a privacy-preserving data mashup algorithm to address the aforementioned challenges. Experiments on real-life data suggest that our proposed architecture and algorithm are effective for simultaneously preserving both privacy and information utility on the mashup data. To the best of our knowledge, this is the first work that integrates high-dimensional data for mashup service.
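    The third challenge, the curse of high dimensionality, can be demonstrated empirically: as attributes are added to a fixed number of records, the fraction of records with a unique attribute combination approaches 1, leaving K-anonymity nothing to protect without heavy generalization. A small illustrative simulation on synthetic binary attributes:

    ```python
    import random
    from collections import Counter

    def unique_fraction(n_records, n_attrs, domain=2, seed=0):
        """Fraction of records whose attribute combination is unique in
        the data set -- such records are re-identifiable and defeat
        plain K-anonymity."""
        rng = random.Random(seed)
        rows = [tuple(rng.randrange(domain) for _ in range(n_attrs))
                for _ in range(n_records)]
        tally = Counter(rows)
        return sum(1 for r in rows if tally[r] == 1) / n_records

    unique_fraction(1000, 2)   # 2 attributes: essentially no unique records
    unique_fraction(1000, 30)  # 30 attributes: nearly every record is unique
    ```

    This is why mashup data, whose attribute count grows with every joined source, needs anonymization techniques designed specifically for high-dimensional data.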